[ENH] V1 → V2 API Migration - studies#1610
Conversation
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1610 +/- ##
==========================================
+ Coverage 52.04% 52.06% +0.02%
==========================================
Files 36 58 +22
Lines 4333 4965 +632
==========================================
+ Hits 2255 2585 +330
- Misses 2078 2380 +302 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Implementing def _check_fold_timing_evaluations( # noqa: PLR0913
self,
fold_evaluations: dict[str, dict[int, dict[int, float]]],
num_repeats: int,
num_folds: int,
*,
max_time_allowed: float = 60000.0,
task_type: TaskType = TaskType.SUPERVISED_CLASSIFICATION,
check_scores: bool = True,
) -> None:Final function signature: def list( # noqa: PLR0913
self,
limit: int | None = None,
offset: int | None = None,
status: str | None = None,
main_entity_type: str | None = None,
uploader: list[int] | None = None,
benchmark_suite: int | None = None,
) -> Any: |
Signed-off-by: rohansen856 <rohansen856@gmail.com>
geetu040
left a comment
There was a problem hiding this comment.
Good work. Just use the listing as suggested in #1575 (comment) which is already similar to what you have done.
|
@geetu040 I reviewed the specific changes needed and have a slight doubt in the pandas implementation. class StudiesAPI(ResourceAPI, ABC):
@abstractmethod
def list( # noqa: PLR0913
self,
limit: int | None = None,
offset: int | None = None,
status: str | None = None,
main_entity_type: str | None = None,
uploader: list[int] | None = None,
benchmark_suite: int | None = None,
) -> pd.DataFrame: ...and similarly i have to change the return object in xml_string = response.text
# Parse XML and convert to DataFrame
study_dict = xmltodict.parse(xml_string, force_list=("oml:study",))
# Minimalistic check if the XML is useful
assert isinstance(study_dict["oml:study_list"]["oml:study"], list), type(
study_dict["oml:study_list"],
)
assert (
study_dict["oml:study_list"]["@xmlns:oml"] == "http://openml.org/openml"
), study_dict["oml:study_list"]["@xmlns:oml"]
studies = {}
for study_ in study_dict["oml:study_list"]["oml:study"]:
# maps from xml name to a tuple of (dict name, casting fn)
expected_fields = {
"oml:id": ("id", int),
"oml:alias": ("alias", str),
"oml:main_entity_type": ("main_entity_type", str),
"oml:benchmark_suite": ("benchmark_suite", int),
"oml:name": ("name", str),
"oml:status": ("status", str),
"oml:creation_date": ("creation_date", str),
"oml:creator": ("creator", int),
}
study_id = int(study_["oml:id"])
current_study = {}
for oml_field_name, (real_field_name, cast_fn) in expected_fields.items():
if oml_field_name in study_:
current_study[real_field_name] = cast_fn(study_[oml_field_name])
current_study["id"] = int(current_study["id"])
studies[study_id] = current_study
return pd.DataFrame.from_dict(studies, orient="index")A total of 3 files would be affected: Can you please confirm my approach... After that i will update the PR. |
|
@rohansen856 yes sounds right |
Signed-off-by: rohansen856 <rohansen856@gmail.com>
|
Updated! Ready for review. |
geetu040
left a comment
There was a problem hiding this comment.
Almost fine, just complety remove _list_studies as well and replace _list_studies with api_context.backend.studies.list as the parameter for partial in list_studies. Hope I didnot confuse you, just search for the exact method names in code. Let me know if I am not clear enough.
Oh definitely! I prolly missed that in |
…list Signed-off-by: rohansen856 <rohansen856@gmail.com>
This reverts commit fd43c48.
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
…into studies-migration # Conflicts: # openml/_api/__init__.py # openml/_api/resources/base/resources.py # openml/_api/resources/study.py
|
|
||
|
|
||
| class StudyV1API(ResourceV1API, StudyAPI): | ||
| def list( # noqa: PLR0913 |
There was a problem hiding this comment.
I think we can split this into 3 functions for more readability:
- list()
- _build_url()
- _parse_list_xml()
check #1606 for reference
There was a problem hiding this comment.
Understood! will deparate the long list function into the said 3 functions with proper docstring. applying with next commit.
| @@ -0,0 +1,94 @@ | |||
| # License: BSD 3-Clause | |||
| from __future__ import annotations | |||
There was a problem hiding this comment.
I think it would be better to change the file name to "test_study" for consistency
There was a problem hiding this comment.
Agreed! applying with next commit.
There was a problem hiding this comment.
also in this case similarly, tests\test_study folder should be renamed to tests\test_studies.
cc @geetu040
There was a problem hiding this comment.
makes sense, but let's not do it here, that will make the file hard to review with visible changes
| assert all(studies_df["status"] == "active") | ||
|
|
||
| @pytest.mark.uses_test_server() | ||
| def test_list_pagination(self): |
There was a problem hiding this comment.
I don't think we need to test pagination here. These tests should only be specific for the API. It's better to leave this test on test_study_functions if it's there.
There was a problem hiding this comment.
there is actually no pagination test in test_study_functions. Implementing this here should be fine... LMK if do u think we need to remove it still...
tests/test_api/test_studies.py
Outdated
|
|
||
| def setUp(self) -> None: | ||
| super().setUp() | ||
| self.api = StudyV2API(self.http_client) |
There was a problem hiding this comment.
This is v2, you need to use
self.v2_client = self._get_http_client(
server="http://localhost:8001/",
base_url="",
api_key="",
timeout_seconds=self.timeout_seconds,
retries=self.retries,
retry_policy=self.retry_policy,
cache=self.cache,
)
and change the server to your local v2 server
There was a problem hiding this comment.
Understood!
replacing this:
self.api = StudyV2API(self.http_client)with this:
self.v2_client = self._get_http_client(
server="http://localhost:8001/",
base_url="",
api_key="",
timeout=self.timeout,
retries=self.retries,
retry_policy=self.retry_policy,
cache=self.cache,
)
self.api = StudyV2API(self.v2_client)
tests/test_api/test_studies.py
Outdated
| self.v2_api = StudyV2API(self.http_client) | ||
|
|
||
| @pytest.mark.uses_test_server() | ||
| def test_v1_v2_compatibility(self): |
There was a problem hiding this comment.
I think this should test that the output matches and follow the naming style mentioned here: #1575 (comment)
check #1603 for reference
tests/test_api/test_studies.py
Outdated
| # Both should have delete, tag, untag from base | ||
| for method in ["delete", "tag", "untag", "publish"]: | ||
| assert hasattr(self.v1_api, method) | ||
| assert hasattr(self.v2_api, method) |
There was a problem hiding this comment.
I think you need to add Fallback tests as mentioned here: #1575 (comment)
check #1603 for reference
There was a problem hiding this comment.
Understoo! will implement FallbackProxy and a test_list_fallback function that tests the FallbackProxy automatically falls back from V2 to V1 when V2 raises not supported. also in case of test_list_matches i think it should be marked with @pytest.mark.skip(reason="V2 list not yet implemented") as it currently throws OpenMLNotSupportedError...
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Metadata
Details
Stackend PR, Depends on #1576
This PR adds
Studies v2migration.A question:
Due to the pre commit hook i could not put 6 arguments in a function, so i had to workaround that with this instead:
openml_api\resources\studies.py (line 10-15)
I would like to confirm if this approach is correct or not. Raising a draft PR for now.